60 research outputs found

    Babel Treebank of Public Messages in Croatian

    The paper presents the process of constructing a publicly available treebank of public messages written in Croatian. The messages were collected from various electronic sources (e-mail, blog, Facebook and SMS) and published on the Zagreb Museum of Contemporary Art LED facade within the Babel art project. The project aimed to use the facade as an open-space blog or social interface enabling citizens to publicly express their views. The construction and current state of the treebank are presented along with future work plans. A comparison of the Babel Treebank with the Croatian Dependency Treebank and the SETimes.HR treebank regarding differing domains and annotation schemes is briefly sketched. The treebank is used as a test platform for introducing a new standard for syntactic annotation of Croatian texts. An experiment with morphosyntactic tagging and dependency parsing of the treebank is conducted, providing first insight into computational processing of non-standard text in Croatian.

    An Experiment in Verb Valency Frame Extraction from Croatian Dependency Treebank

    The paper presents an approach to semi-automatic verb valency frame extraction from the Croatian Dependency Treebank. Our algorithm extracted 1923 verb valency frames for 594 different verbs. We discuss the applicability of our method to semi-automatic verb valency lexicon creation and refinement, along with possibilities of utilizing it in the task of parsing Croatian texts.

    Tagset Reductions in Morphosyntactic Tagging of Croatian Texts

    Morphosyntactic tagging of Croatian texts is performed with stochastic taggers by using a language model built on a manually annotated corpus implementing the Multext East version 3 specifications for Croatian. Tagging accuracy in this framework is basically predefined, i.e. dependent on two things: the size of the training corpus and the number of different morphosyntactic tags encompassed by that corpus. Since the 100 kw Croatia Weekly newspaper corpus by definition makes a rather small language model in terms of stochastic tagging of free domain texts, the paper presents an approach dealing with tagset reductions. Several meaningful subsets of the Croatian Multext-East version 3 morphosyntactic tagset specifications are created and applied on Croatian texts with the CroTag stochastic tagger, measuring overall tagging accuracy and F1-measures. Obtained results are discussed in terms of applying different reductions in different natural language processing systems and specific tasks defined by specific user requirements.
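The effect of a tagset reduction on measured accuracy can be illustrated with a minimal sketch. It assumes that Multext-East morphosyntactic tags are positional strings (part of speech first, finer features after) and that a reduction simply truncates each tag to its first few positions. The function names and the four example tags are illustrative assumptions, not taken from the paper, and the sketch only rescores existing output; it does not retrain a tagger on the reduced tagset as the paper's experiments with CroTag do.

```python
from typing import Sequence


def reduce_msd(tag: str, keep: int) -> str:
    """Truncate a positional morphosyntactic tag to its first `keep` positions."""
    return tag[:keep]


def accuracy(gold: Sequence[str], predicted: Sequence[str], keep: int) -> float:
    """Share of tokens whose gold and predicted tags agree after reduction."""
    if len(gold) != len(predicted):
        raise ValueError("gold and predicted must have the same length")
    hits = sum(
        reduce_msd(g, keep) == reduce_msd(p, keep)
        for g, p in zip(gold, predicted)
    )
    return hits / len(gold)


# Illustrative tags only (hypothetical tagger output, not corpus data).
gold = ["Ncmsn", "Vmip3s", "Afpfsa", "Ncfsa"]
pred = ["Ncmsa", "Vmip3s", "Afpfsn", "Ncfsa"]

full_tagset = accuracy(gold, pred, keep=10)  # full tags: case errors count
pos_only = accuracy(gold, pred, keep=1)      # part of speech only

print(f"full tagset: {full_tagset:.2f}, POS only: {pos_only:.2f}")
```

In this toy data both errors are case errors, so they disappear once the tags are cut down to the part-of-speech position. Which reduction is acceptable depends on what the downstream system needs from the tags.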

    hr500k – A Reference Training Corpus of Croatian.

    In this paper we present hr500k, a Croatian reference training corpus of 500 thousand tokens, segmented at document, sentence and word level, and annotated for morphosyntax, lemmas, dependency syntax, named entities, and semantic roles. We present each annotation layer via basic label statistics and describe the final encoding of the resource in CoNLL and TEI formats. We also give a description of the rather turbulent history of the resource and give insights into the topic and genre distribution in the corpus. Finally, we discuss further enrichments of the corpus with additional layers, which are already underway.

    Cross-lingual Dependency Parsing of Related Languages with Rich Morphosyntactic Tagsets

    This paper addresses cross-lingual dependency parsing using rich morphosyntactic tagsets. In our case study, we experiment with three related Slavic languages: Croatian, Serbian and Slovene. Four different dependency treebanks are used for monolingual parsing, direct cross-lingual parsing, and a recently introduced cross-lingual parsing approach that utilizes statistical machine translation and annotation projection. We argue for the benefits of using rich morphosyntactic tagsets in cross-lingual parsing and empirically support the claim by showing large improvements over an impoverished common feature representation in the form of a reduced part-of-speech tagset. In the process, we improve over the previous state-of-the-art scores in dependency parsing for all three languages.

    PHYTOPHILOUS FAUNA OF A SMALL AND ARTIFICIAL URBAN LAKE

    The phytophilous community on Myriophyllum spicatum was studied in a small artificial urban lake in the city of Osijek (eastern Croatia) during the spring and summer season of 2010. In the eutrophic conditions, macrophyte stands were well developed, and representatives of the following invertebrate taxa were found in the formed periphyton: Hydrozoa, Nematoda, Gastropoda, Cladocera, Copepoda, and Insecta larvae, including the families Chironomidae and Coleoptera. They displayed differences in temporal abundance patterns. Two separate phases in macrophyte colonization, with differences in invertebrate composition and abundance, were recorded. Insect larvae, particularly Chironomidae, were most abundant in the first phase, through the spring period, and Hydra oligactis (brown hydra) was most abundant in the second phase, i.e. the summer period. Concurrently, microcrustacean abundance declined towards the end of the summer. Results of the analyses indicated that water temperature and periphyton biomass were the variables exerting the main influence on the invertebrate assemblage, while, interestingly, macrophyte size and biomass were negatively correlated with the abundance of most of the fauna. On the other hand, brown hydra was negatively correlated with all other invertebrate taxa except gastropods. A larger surface of submerged macrophytes is the main parameter supporting the increase of invertebrate abundance, because it provides protection from predators and supports the growth of periphyton, an important food source for these phytophilous organisms. Macrophyte length was positively correlated with Hydra abundance, while chironomids were more influenced by periphyton biomass. These organisms can indicate water quality conditions and a potential increase in primary and secondary production.